Crowdworker Filtering with Support Vector Machine
Authors
Abstract
Crowdsourcing has been recognized as a possible way to complement costly user studies, usability studies, relevance judgments for information retrieval studies, and training set construction for automatic document classification. However, the quality of crowdworkers varies with many factors, and their answers often cannot be checked immediately because gold standard answers are lacking. In this paper, we present a machine-learning-based crowdworker filtering technique that can be used to assess workers immediately after they finish their assigned tasks. A Support Vector Machine (SVM)-based crowdworker filter, called the Smart Crowd Filter (SCFilter), predicts the probability that each label is correct and identifies crowdworkers who consistently provide answers that are unlikely to be correct. To verify the performance of the SCFilter, we performed a bad worker detection simulation test and an experiment in an actual crowdsourcing environment on the Amazon Mechanical Turk (AMT) website. In the simulation test, bad worker detection performance was assessed in terms of precision and recall. In the AMT experiment, a statistically significant improvement was observed for automatic document classification.
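The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' SCFilter: the two-dimensional features, the hand-rolled linear SVM, the fixed sigmoid slope used in place of a fitted probability calibration, the 0.5 threshold, and all worker and item names are assumptions chosen only to keep the example small and self-contained.

```python
import math

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Fit w, b by subgradient descent on the L2-regularised hinge loss (y in {-1, +1})."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularisation shrinks w on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move the hyperplane toward this example.
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def prob_label_correct(w, b, x, label, slope=2.0):
    """Sigmoid of the signed SVM score; an uncalibrated stand-in for a probability."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-slope * label * score))

def score_workers(w, b, items, answers):
    """Average, per worker, the estimated probability that each submitted label is correct."""
    scores = {}
    for worker, labels in answers.items():
        probs = [prob_label_correct(w, b, items[i], lab) for i, lab in labels.items()]
        scores[worker] = sum(probs) / len(probs)
    return scores

# Hypothetical toy data: a small trusted training set of 2-D document features.
X_train = [(2, 2), (3, 1), (1, 3), (2.5, 2.5), (-2, -2), (-3, -1), (-1, -3), (-2.5, -2.5)]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]

# Hypothetical items sent to the crowd (no gold answers are used for scoring).
items = {"a": (3, 3), "b": (2, 3), "c": (3, 2), "d": (-3, -3), "e": (-2, -3), "f": (-3, -2)}
answers = {
    "w_good":  {"a": 1, "b": 1, "c": 1, "d": -1, "e": -1, "f": -1},
    "w_noisy": {"a": -1, "b": 1, "c": 1, "d": -1, "e": -1, "f": -1},
    "w_bad":   {"a": -1, "b": -1, "c": -1, "d": 1, "e": 1, "f": 1},
}

THRESHOLD = 0.5  # assumed cut-off; a real deployment would tune this
w, b = train_linear_svm(X_train, y_train)
scores = score_workers(w, b, items, answers)
flagged = sorted(name for name, s in scores.items() if s < THRESHOLD)

# Precision and recall of the detection, given which workers are truly bad in this toy setup.
truly_bad = {"w_bad"}
hits = len(truly_bad & set(flagged))
precision = hits / len(flagged) if flagged else 0.0
recall = hits / len(truly_bad)

print(scores)
print("flagged:", flagged, "precision:", precision, "recall:", recall)
```

In this toy setup, a worker whose labels consistently disagree with the classifier gets a low average score and is flagged, while a worker with an occasional mistake stays above the threshold. With real data, the features would come from the documents being labeled, and the probability estimate would normally be calibrated rather than set by a fixed slope.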
Similar resources
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
A Wavelet Support Vector Machine Combination Model for Daily Suspended Sediment Forecasting
In this study, a wavelet support vector machine (WSVM) model is proposed for daily suspended sediment (SS) prediction. The WSVM model combines two methods: discrete wavelet analysis and support vector machine (SVM). The developed model was compared with a single SVM. Daily discharge (Q) and SS data from the Yadkin River at Yadkin College, NC station in the USA were used. I...
Application of Genetic Algorithm Based Support Vector Machine Model in Second Virial Coefficient Prediction of Pure Compounds
In this work, a Genetic Algorithm boosted Least Square Support Vector Machine model, an improved version of the Support Vector Machine model that solves a set of linear equations instead of a quadratic program, was used to estimate the second virial coefficient of 98 pure compounds. The compounds were classified into different groups. The best parameters were obtained by the Genetic Algorithm method ...
Identification areas with inundation potential for urban runoff harvesting using the support vector machine model
Rainfall-runoff from urban areas is one of the available water resources, and it is wasted due to a lack of attention and proper management. In addition, urban runoff in excess of drain capacity causes many problems, including inundation and urban environmental pollution. Therefore, harvesting this runoff can provide part of the water required in urban areas and also reduce flood and urban inund...
Fault diagnosis in a distillation column using a support vector machine based classifier
Fault diagnosis has always been an essential aspect of control system design. It is necessary because of the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...
Journal title:
Volume, issue:
Pages: -
Publication date: 2011